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Abstract: Bayesian analysis is increasingly popular for use in social science and other application areas where 
the data are observations from an informative sample. An informative sampling design leads to inclusion prob¬ 
abilities that are correlated with the response variable of interest. Model inference performed on the observed 
sample taken from the population will be biased for the population generative model under informative sampling 
since the balance of information in the sample data is different from that for the population. Typical approaches 
to account for an informative sampling design under Bayesian estimation are often difficult to implement be¬ 
cause they require re-parameterization of the hypothesized generating model, or focus on design, rather than 
model-based, inference. We propose to construct a pseudo-posterior distribution that utilizes sampling weights 
based on the marginal inclusion probabilities to exponentiate the likelihood contribution of each sampled unit, 
which weights the information in the sample back to the population. Our approach provides a nearly automated 
estimation procedure applicable to any model specified by the data analyst for the population and retains the pop¬ 
ulation model parameterization and posterior sampling geometry. We construct conditions on known marginal 
and pairwise inclusion probabilities that define a class of sampling designs where L\ consistency of the pseudo 
posterior is guaranteed. We demonstrate our method on an application concerning the Bureau of Labor Statistics 
Job Openings and Labor Turnover Survey. 

Keywords and phrases: Survey sampling, Gaussian process, Dirichlet process, Bayesian hierarchical models. 
Latent models, Markov Chain Monte Carlo. 


1. Introduction 

Bayesian formulations are increasingly popular for modeling hypothesized distributions with complicated de¬ 
pendence structures. Their popularity stems from the ease of capturing this dependence by employing models 
with random effects parameters with a hierarchical construction that regulates the borrowing of information for 
estimation. Latent parameters are often used in the model to permit flexibility in the estimation of the depen¬ 
dencies among the observations (Dunson 2010). In social science applications, utilization of latent parameters 
may be useful for making inference about intrinsic belief states of people from their observed actions(see for 
example, Savitsky & Dalai (2013)) Other application areas in which latent parameters may be employed include, 
engineering and natural science, which use them to parameterize elements of an evolving process. 

Data used in these type of applications are often acquired through a complex sample design, resulting in 
probabilities of inclusion that are associated with the variable of interest. This association could result in an 
observed data set consisting of units that are not independent and identically distributed. A sampling design that 
produces a correlation between selection probabilities and observed values is referred to as informative. Failure 
to account for this dependence caused by the sampling design could bias estimation of parameters that index the 
joint distribution hypothesized to have generated the population ((Holt et al. 1980)). 

1.1. Examples 

We next outline some examples of survey instruments that employ informative sampling designs and associated 
inferential goals for models estimated on observed samples realized from these surveys. 

Example 1: The Survey of Occupational Illnesses and Injuries (SOU) is administered to U.S. business es¬ 
tablishments by the U.S. Bureau of Labor Statistics (BLS), in partnership with individual states, in order to 
capture workplace induced injuries and illnesses. A stratified sampling design is used where strata are indexed 
by state-industry-size-injury rate. Strata containing establishments that historically express higher injury rates 
are assigned higher sample inclusion probabilities. The resulting sample will contain a larger proportion of es¬ 
tablishments that express higher injury rates than the population, as a whole. States desire to perform regression 
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modeling with variable selection to discover the root causes that predict illnesses and injuries among the popu¬ 
lation of establishments, estimated from the observed sample. The model-estimated coefficients from the sample 
will be biased absent correction for over-representation of establishments that tend to express relatively high 
injury rates. 

Example 2: The Current Establishment Statistics (CES) is a BLS survey of U.S. business establishments 
that collects employment count data across states and industries under a stratified sampling design with strata 
indexed by the number of employees in each establishment. Strata containing relatively larger establishments are 
assigned higher inclusion probabilities than those which hold establishments with relatively fewer employees. 
The distribution of employment totals in the observed sample of establishments will be skewed towards relatively 
larger values as compared to the population of establishments. An important area of modeling inference is to 
understand industry-indexed differences in monthly employment trends and correlations among industries in the 
population. We would use a mixed effects model, parameterized with random effects indexed by industry and 
month. Estimation of the population distribution under our model from the observed sample will be biased absent 
some correction for the skewness in the sample towards larger-sized establishments. 

Example 3: BLS collects establishment-indexed employment totals in both the Quarterly Census of Employ¬ 
ment and Wages (QCEW) and the CES survey. CES survey participants also provide submissions to the QCEW, 
such that their reported monthly employment totals for an overlapping time period of interest should be equal 
between the two instruments, but they are not for approximately 10000 establishments, indicating one or more 
employment count submission errors for those respondents. A response variable of interest, termed the “error 
time series”, was created by taking the absolute value of the difference in reported employment totals among 
the 10000 establishments for each month over a 12 month period. A “response analysis survey” (RAS) of ap¬ 
proximately 2000 establishments was taken from this population with the goal to understand the process drivers 
for committing errors so that BLS may target resources to establishments that mitigate them. The modeling fo¬ 
cus is to identify probabilistic clusters of establishments with similar error patterns over the 12 month period 
and to examine the process by which establishments in each cluster construct their data submissions to BLS. The 
RAS survey design stratified the population of 10000 establishments based on phenomena of interest expressed in 
portions of each time series; for example, a big jump in the reported difference at year-end may indicate establish¬ 
ments who count checks that include regular pay and bonuses for each employee, instead of counting employees. 
Higher inclusion probabilities were assigned to those strata expressing phenomena of relatively greater interest to 
BLS researchers. Modeling the number of and memberships in probabilistic clusters of error patterns expressed 
in the population from the RAS sample may be biased because the proportions of error patterns expressed in the 
sample are designed to be different from the population. 

Example 4: The Current Expenditure (CE) survey is administered to U.S. households by BLS for the purpose 
of determining the amount of spending for a broad collection of goods and service categories and it serves as 
the main source used to construct the basket of goods later used to formulate the Consumer Price Index. The 
CE employs a multi-stage sampling design that draws clusters of core-based statistical areas (CBSAs), such 
as metropolitan and micropolitan areas, from which Census blocks and, ultimately, households are sampled. 
Economists desire to model the propensity or probability of purchase for a variety of goods and services. The 
balance of sampled clusters may not be reflective of those in the population; for example, if particularly high 
income ares are included in the sample. So inference on purchase propensities for the population made from the 
observed sample will be biased absent correction for the informative sampling design. 

Example 5: BLS administers the Job Openings and Labor Turnover survey (JOLTS) to business establish¬ 
ments with the focus to measure labor market dynamics by reporting the number of job openings, hires and 
separations, which is a leading indicator for employment trends. The sampling design assigns larger inclusion 
probabilities to establishments with relatively more employees because larger establishments drive the variance 
in the reported statistics. Our modeling goals are to understand differences in labor force dynamics based on 
employment ownership (e.g., private, public) and region as part of imputing missing values with respect to the 
population generating distribution. As with the CES sampling design, however, our sample will tend to over¬ 
represent relatively larger-sized establishments, so that inference and imputation using the sample will be biased 
for the population. We develop a multivariate count data population generating model in Section 4, where we 
illustrate the resulting estimation bias from failure to account for the correlations between assigned inclusion 
probabilities and the response variables of interest for our sample. 

The target audience for this article are data analysts who wish to perform some distributional inference using 
data obtained from an informative sample design on a population using a model they specify, p (y, | A), As A, for 
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density, p. We discuss, in the next section, how the limited literature on this topic does not adequately provide a 
general method for making distributional inference on a population while adjusting for the unequal probabilities 
of selection. 

In this article, we propose an approach that replaces the likelihood with the “pseudo” likelihood (Chambers & 
Skinner 2003), p (y;|5,- = 1, A) w ', using sampling weight, w,- l/m. This re-weights the likelihood contribution 
for each observed unit with intent to re-balance the information in the observed sample to approximate the balance 
of information in the target finite population; correcting for the informativeness. We show that the proposed 
method for Bayesian estimation on complex sample data allows for asymptotically consistent inference on any 
population-generating model specified by the data analyst. 

Additionally, this method does not require information about the complex design, other than the probabili¬ 
ties of selection, or about the full population, other than the observed data. We believe this makes the method 
applicable to more situations. Indeed, it is often the case that the data analyst does not have access to the full de¬ 
sign information or auxiliary variables on the population, zi, ■ ■ ■,Zn , used to assign the probabilities of selection 
K\..... 7T ; y. However, it is common for the probabilities of selection for the units in the sample, Jl\....,K n , to be 
provided with the observed sample data. 

1.2. Review of Methods to Account for Informative Sampling 

One current approach is to account for the informativeness by parameterizing the sampling design into the model 
(Little 2004). Parameterizing even a simple informative design is often difficult to accomplish and may disrupt 
desired inference by requiring a change to the underlying population model parameterization. The analyst in 
Example 3, above, desires to perform inference on an a priori unknown clustering of sampled units with their 
population model for data acquired under a stratified sampling design. Specifying random effects to be indexed 
by strata will likely conflict with the identification and composition of inferred clusters. Further, the data an¬ 
alyst may not have access to the sampling design, but only indirect information in form of sampling weights. 
Lastly, the analyst is sometimes required to impute the unobserved units in the finite population, which may be 
computationally infeasible. 

Another approach incorporates the sampling weights into inference about the population, as is our intent, 
but requires a particular form for the likelihood that does not allow the analyst to impose their own popula¬ 
tion model formulation of inferential interest. For example, Dong et al. (2014) specifies an empirical likelihood, 
while Kunihama et al. (2014) constructs a non-parametric mixture for the likelihood and Rao & Wu (2010) uses 
a sampling-weighted (pseudo) empirical likelihood. All of these approaches impose Dirichlet distribution priors 
for the mixture components with hyperparameters specified as a function of the first-order sampling weights. Si 
et al. (2015) regress the response variable on a Gaussian process function of the weights for sampling designs 
where sub-groups of sampled units have equal weights (e.g., a stratified sampling design). These approaches are 
designed for inference about simple mean and total statistics, rather than inference for parameters that character¬ 
ize an analyst-specified population model that is the focus for our proposed method. 

One method that uses a plug-in estimator, as do we in our method, is to construct a joint likelihood of the 
population distribution and sample inclusion in a simple logistic regression model (Malec et al. 1999). This 
allows one to analytically marginalize over the parameters indexed by the non-sampled units. This approach is 
limited in application to a class of simple population models that permit analytic integration and may not be 
applied to more general classes of Bayesian models for the population that we envision in development of our 
approach. 

Perhaps the most general Bayesian approach constructs models to co-estimate parameters for conditional 
expectations of inclusion probabilities jointly with the population-generating model parameters at each level of a 
hierarchical construction (Pfeffermann et al. 2006). This formulation is fully Bayesian such that it accounts for all 
sources of uncertainty in population generation and inclusion of units, but requires a custom implementation of 
an MCMC sampler for each specified population model, such as their simple two-level linear regression model. 
The implementations may increase the complexity of the specified model and reduce the quality of posterior 
mixing in the MCMC, so that they are suitable for relatively simple population probability models. 

The method we propose is intended to allow Bayesian inference from any population model that may be 
specified by the the data analyst under an informative sampling design, unlike the alternative methods. It provides 
asymptotically unbiased estimation using only the distribution for the observed sample units and normalized 
Hajek-like sampling weights. The “plug-in” type method accounts for the informative sampling design by raising 
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the likelihood contribution of each sampled observation to the power of their associated sampling weight. The 
implementation of the plug-in procedure for Bayesian estimation multiplies the sampling weight into each full 
conditional log-posterior density. This can then sampled in the typical sequential scan MCMC. 

Unlike these other methods that are prominent in the literature, this method: 1.) does not impose a population 
model (implicitly or explicitly), unlike the most recently-developed methods (Dong et al. 2014, Kunihama et al. 
2014, Rao & Wu 2010, Si et al. 2015); 2.) requires only the sampling weights and does not require parameterizing 
the sampling design unlike Little (2004); 3.) does not require a customized MCMC sampling procedure unlike 
Pfeffermann et al. (2006), so can be done automatically; 4). does not require imputing the non-sampled units in 
the finite population. Our data application and estimation model in the sequel are intended to be representative 
of common problems for Bayesian inference, and the application data are not readily estimated with these other 
methods that account for informative sampling. 

We formulate the pseudo-posterior density as sampling weight-adjusted plug-in from which we conduct model 
inference about the population under a dependent, informative sampling design in Section 2. Conditions are 
constructed that guarantee a frequentist L\ contraction of the pseudo posterior distribution on the true generating 
distribution in Section 3. We make an application of the pseudo posterior estimator to construct a regression 
model for count data using a dataset of monthly job hires and separations collected by the U.S. Bureau of Labor 
Statistics in Section 4. We reveal large differences for parameter estimates between incorporation versus ignoring 
the sampling weights. This section also includes a simulation study that compares the pseudo posterior estimated 
on the observed sample to the posterior estimated on the entire finite population. The paper concludes with a 
discussion in Section 5. The proofs for the main result, along with two enabling results are contained in an 
Appendix. 


2. Method to account for Informative Sampling 

We begin by constructing the pseudo likelihood and associated pseudo posterior density under any analyst- 
specified prior formulation on the model, A £ A. 


2.1. Pseudo Posterior 


Suppose there exists a Lebesgue measurable population-generating density, K (y| A), indexed by parameters, A £ 
A. Let <5, £{0,1} denote the sample inclusion indicator for units i = 1 ..... A' from the population under sampling 
without replacement. The density for the observed sample is denoted by, 7r(y„|A) = K (y| d, = 1, A), where “o” 
indicates “observed”. 

The plug-in estimator for posterior density under the analyst-specified model for A £ A is 


7t (A|y 0 ,w) 


Ur(y,^r 

i= 1 


7T(A), 


( 1 ) 


where n?=i 7^ (^o./I-^) VV ' denotes the pseudo likelihood for observed sample responses, y a . The joint prior den¬ 
sity on model space assigned by the analyst is denoted by /r(A). This pseudo likelihood employs sampling 
weights, {vt>/ 1 /7T,}, constructed to be inversely proportional to unit inclusion probabilities. Each sampling 
weight assigns the relative importance of the likelihood contribution for each sample observation to approximate 
the likelihood for the population. We use ft to denote the noisy approximation to posterior distribution, 7 r, and we 
make note that the approximation is based on the data, y 0 , and sampling weights, {w}, confined to those units 
included in the sample, S. 

The total estimated posterior variance is regulated by the sum of the sampling weights. We define unnor¬ 
malized weights, {wi = 1 / K ,}, and subsequently normalize them, vv, = -^L-, i = 1,... ,n, to sum to the sample 

n 

size, n, the asymptotic units of information in the sample. Incorporation of the sampling weights to formulate 
the pseudo posterior estimator is expected to increase the estimated parameter posterior variances relative to the 
(unweighted) posterior estimated on a simple random (non-informative) sample because the weights encode the 
uncertainty with which samples represent the finite population under repeated sampling. This increase in esti¬ 
mated posterior variance may be partly or wholly offset to the extent that the informative sampling design is 
more efficient than simple random sampling; for example, a stratified sampling design that takes simple random 
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samples within each stratum may produce samples that provide better coverage of the population. Although our 
method utilizes the weights as a “plug-in”, rather than imposing a prior, Pfeffermann & Sverchkov (2009) use 
Bayes rule to demonstrate one may replace the weights with their conditional expectation given the observed 
response to correct for informative sampling. Replacing the raw weights with their conditional expectation given 
the observed response may serve to reduce the total variation attributed to weighting (and the resulting posterior 
uncertainty) in the case where the actual sampled observations express information in different proportions than 
intended in the sampling design. Even though the conditional distribution of the weights given the response is 
generally different for the observed sample than for the population, nevertheless their conditional expectations 
are equal. 


3. Pseudo Posterior Consistency 


We formulate a pseudo posterior distribution in this section and specify conditions under which it contracts on 
the true generating distribution in L\. Let v £ Z + index a sequence of finite populations, {f/ v } v =t ; v v , each of 
size, \U V \ = N v , such that N v < Ny , for v < v , so that the finite population size grows as v increases. Suppose 
that X v ,i,... ,X v ,jv v are independently distributed according to some unknown distribution P. (with density, p) 
defined on the sample space, ( SC ,.<//). If II is a prior distribution on the model space, (,:f > . r if ) to which P is 
known to belong, then the posterior distribution is given by 


n(B|x 1 ,...,x^ v ) 


.f^n£i ^(x,0dn(p) 


( 2 ) 


for any B £ c £, where we refer to {X V: ,}/=r... i v v as {X;},- = i for readability when the context is clear. 

Ghosal & van der Vaart (2007) study the rate at which this posterior distribution converges to the assumed 
true (and fixed) generating distribution Pq. They prove, under certain conditions on the model space, PP, and 
the prior distribution, n, that in Po~ probability, the posterior distribution concentrates on an arbitrarily small 
neighborhood of Pq as N v f °°- 

The observed data on which we focus is not the entire finite population, Xi,...,Xyv v , but rather a sample, 
Xi,...,X„ v , with 7i v < N v , drawn under a sampling design distribution applied to the finite population under 
which each unit, i £ (1,... ,N V ), is assigned a probability of inclusion in the sample. These unit inclusion proba¬ 
bilities are constructed to depend on the realized finite population values, Xj,..., X,v v , at each v. 


3.1. Pseudo Posterior Distribution 

A sampling design is defined by placing a known distribution on a vector of inclusion indicators, 6 V = (<j v i,..., S v n v ), 
linked to the units comprising the population, U v . The sampling distribution is subsequently used to take an ob¬ 
served random sample of size n v < N v . Our conditions needed for the main result employ known marginal 
unit inclusion probabilities, 7r v , = Pr { 8 V i = 1} for all i £ U v and the second-order pairwise probabilities, K VlJ = 
Pr{<5 Vl ' = 1 fl 8 v j = 1} for i,j £ U v , which are obtained from the joint distribution over (5 v i,..., S v n v )- The de¬ 
pendence among unit inclusions in the sample contrasts with the usual iid draws from P. We denote the sampling 
distribution by P v . 

Under informative sampling, the marginal inclusion probabilities, 7T V , = P{8 vi = 1}, i £ (1,... ,N V ), are for¬ 
mulated to depend on the finite population data values, X,v v = (Xj,..., X,y v ). Since the resulting balance of in¬ 
formation would be different in the sample, the posterior distribution for (Xi 8 V i,.... X,y v 5 Vi y v ), that we employ 
for inference about Po, is not equal to that of Equation 2. 

Our task is to perform inference about the population generating distribution, Po, using the observed data taken 
under an informative sampling design. We account for informative sampling by “undoing” the sampling design 
with the weighted estimator, 

P n (X;5 V ,) := p (X,) 5 '"/*'" ,7 £ Uv, (3) 

which weights each density contribution, p(X,), by the inverse of its marginal inclusion probability. This con¬ 
struction re-weights the likelihood contributions defined on those units randomly-selected for inclusion in the 
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observed sample ({/ £ U v : S V i - 1}) to approximate the balance of information in U v . This approximation for 
the population likelihood produces the associated pseudo posterior. 


n*(fi|X 1 5vi,...,X JVv SvAr v ) 


/p 65 n^i P ^(X,8 v ,)dU(P) 

Spe^tx%^i)dn{P)' 


(4) 


that we use to achieve our required conditions for the rate of contraction of the pseudo posterior distribution on 
P 0 . We recall that both P and S v are random variables defined on the space of measures and possible samples, 
respectively. Additional conditions are later formulated for the distribution over samples, P v , drawn under the 
known sampling design, to achieve contraction of the pseudo posterior on Pq. We assume measurability for 
the sets on which we compute prior, posterior and pseudo posterior probabilities on the joint product space, 
ST x SP . For brevity, we use the superscript, 7r, to denote the dependence on the known sampling probabilities, 
{n V i}i=i,...,N v > f° r example, IT 1 (BlX^vi,.. .,X Nv 8 vNv ) := n(ZJ| (Xi5 v i,.. .,X Nv 8 vNv ), (7T v i,... ,JTvJVv))- 

Our main result is achieved in the limit as v f °°, under the countable set of successively larger-sized popula¬ 
tions, {U v } ve z+. We define the associated rate of convergence notation, &(b v ), to denote limy^ = 0. 


3.2. Empirical process functionals 


We employ the empirical distribution approximation for the joint distribution over population generation and the 
draw of an informative sample that produces our observed data to formulate our results. Our empirical distri¬ 
bution constmction follows Breslow & Wellner (2007) and incorporates inverse inclusion probability weights, 
{1 /7T V i};=i jv v , to account for the informative sampling design. 


1 Nv 8 

p w = ^L^(xo, 


V (=1 


Tt V i 


(5) 


where 8 (X,-) denotes the Dirac delta function, with probability mass 1 on X, and we recall that N v = \U V \ 
denotes the size of of the finite population. This construction contrasts with the usual empirical distribution, 
P,y v = f ! 8 (X,-), used to approximate P £ fiP, the distribution hypothesized to generate the finite population, 
Uv 

We follow the notational convention of Ghosal et al. (2000) and define the associated expectation functionals 
with respect to these empirical distributions by P^ Yf=\ |^/(X,). Similarly, P# v / = 

Lastly, we use the associated centered empirical processes, <G^ v = y/1% (Pjy ,-fl)) and <&n v = y/Nv (P;v v —Po). 

The sampling-weighted, (average) pseudo Hellinger distance between distributions, Pi ,P 2 ^ d^ 2 {p 1 , P 2 ) — 

1 

^.d 2 (pi(Xi),p 2 (Xj)), where d(pi,p 2 ) = f (y/pT- y/pi) 2 dp 1 (for dominating measure, p). We 
need this empirical average distance metric because the observed (sample) data drawn from the finite popula¬ 
tion under P v are no longer independent. The implication is that our result apply to finite populations generated 
as inid from which informative samples are taken. The associated non-sampling Hellinger distance is specified 
with, df {pi,p 2 ) = d 2 (. pi(X i ),p 2 (X i )). 


3.3. Main result 

We proceed to construct associated conditions and a theorem that contain our main result on the consistency of 
the pseudo posterior distribution under a class of informative sampling designs at the true generating distribution, 
P Q . Our approach extends the main in-probability convergence result of Ghosal & van der Vaart (2007) by adding 
new conditions that restrict the distribution of the informative sampling design. Suppose we have a sequence, 
^iv v 1 0 and N v £,f t 00 an d ”v^ v t 00 as v G t 00 an d an Y constant, C > 0, 

(Al) (Local entropy condition - Size of model) 

sup log A (^/36,{P £ &> Nv : d Nv {P,P 0 ) < %},d Nv ) <N V ^ , 
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(A2) (Size of space) 

n(,n^v v ) < exp ( 2(1 +2C))) 

(A3) (Prior mass covering the truth) 


n ( p : -Poiog — < $ v nPo 

P 0 


log^ 

Po. 


1 2 


^ 4 N v 


> exp (-N v $n v C) 


(A4) (Non-zero Inclusion Probabilities) 


sup 

v 


1 


min n V j 

ieu v 


< y, with /^--probability 1. 


(A5) (Asymptotic Independence Condition) 


limsup max 

v^oo 


K vij 
n vi itv j 


&[N V '), with Pq— probability 1 


such that for some constant, C 3 > 0 , 


A v sup max 

v i^jeu v 


wij 


< C 3 , for A v sufficiently large. 


K v i 7Ty j _ 

(A 6 ) (Constant Sampling fraction) For some constant, / £ (0,1), that we term the “sampling fraction”. 


lim sup 

V 


II v 

Nv 



= ^( 1 ), with Pq— probability 1 . 


Condition (Al) denotes the logarithm of the covering number, defined as the minimum number of balls of radius 
5 /36 needed to cover {/ J £ v v : c/,v v (P, Po) < C } under distance metric, t/,v v • This condition restricts the growth 
in the size of the model space, or as noted by Ghosal et al. (2000), the space, , must be not too big in order that 
the condition specifies an optimal convergence rate (Wong & Shen 1995). This condition guarantees the existence 
of test statistics, (j)„ v (X[ <5 V .|,..., \ Nv <5 v ,,y v ) £ (0,1), needed for enabling Lemma B.l, stated in the Appendix, 
that bounds the expectation of the pseudo posterior mass assigned on the set {P £ &N V ■ d nv (P,P 0 ) > &„}. 
Condition (A3) ensures the prior, n, assigns mass to convex balls in the vicinity of Po. Conditions (Al) and (A3), 
together, define the minimum value of <^v v , where if these conditions are satisfied for some £,n v , then they are also 
satisfied for any c, > <^y v - Condition (A2) allows, but restricts, the prior mass placed on the uncountable portion 
of the model space, such that we may direct our inference to an approximating sieve, /^,y v . This sequence of 
spaces “trims” away a portion of the space that is not entropy bounded (in condition (Al)). In practice, trimming 
the space may usually be performed to ensure the entropy bound. 

The next three new conditions impose restrictions on the sampling design and associated known distribution, 
P v , used to draw the observed sample data that, together, define a class of allowable sampling designs on which 
the contraction result for the pseudo posterior is guaranteed. Condition (A4) requires the sampling design to 
assign a positive probability for inclusion of every unit in the population because the restriction bounds the 
sampling inclusion probabilities away from 0. Since the maximum inclusion probability is 1, the bound, y > 1. 
No portion of the population may be systematically excluded, which would prevent a sample of any size from 
containing information about the population from which the sample is taken. Condition (A5) restricts the result to 
sampling designs where the dependence among lowest-level sampled units attenuates to 0 as v f °°; for example, 
a two-stage sampling design of clusters within strata would meet this condition if the number of population units 
nested within each cluster (from which the sample is drawn) increases in the limit of v. Such would be the case 
in a survey of households within each cluster if the cluster domains are geographically defined and would grow 
in area as v increases. In this case of increasing cluster area, the dependence among the inclusion of any two 
households in a given cluster would decline as the number of households increases with the size of the area 
defined for that cluster. Condition (A 6 ) ensures that the observed sample size, n v , limits to °° along with the size 
of the partially-observed finite population, A v . 
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Theorem 3.1. Suppose conditions (Al )-(A6) hold. Then for sets ,>fy v C £?, constants, K > 0, and M sufficiently 
large, 


(P : dff v (P,Po) >M^ fv \XiS vl ,...,X Ny Sy Ny ) < 

16 r[r+c 3 ] 


Kn v £, 2 


, + 5yexp - 
(Kf+l-2f) 2 NM \ 2 y 


<N V 


( 6 ) 


which tends to 0 as ( n Vl N v ) f °°. 

We note that the rate of convergence is injured for a sampling distribution, P v , that assigns relatively low 
inclusion probabilities to some units in the finite population such that / will be relatively larger. Samples drawn 
under a design that expresses a large variability in the sampling weights will express more dispersion in their 
information similarity to the underlying finite population. Similarly, the larger the dependence among the finite 
population unit inclusions induced by P v , the higher will be Cy and the slower will be the rate of contraction. 

The separability of the conditions on & and Id (P), on the one hand, from those on the sampling design 
distribution, P v , on the other hand, coupled with the sequential process of taking the observed sample from the 
finite population reveal that the pseudo posterior, defined on the partially-observed sample from a population, 
contracts on !\) through converging to the posterior distribution defined on each fully-observed population. We 
demonstrate this property of the pseudo posterior in a simulation study conducted in Section 4.1. By contrast, if 
the posterior distribution, defined on each fully-observed finite population, fails to meet conditions (Al), (A2) 
and (A3) for the main result from Equation 6, such that it fails to contract on I]), then the associated pseudo 
posterior cannot contract on Pq, even if the sampling design satisfies conditions (A4), (A5) and (A6). 

The proof generally follows that of Ghosal et al. (2000) with substantial modification to account for informa¬ 
tive sampling. The L\ rate of contraction of the pseudo posterior distribution with respect to the joint distribution 
for population generation and the taking of informative samples is derived. Our approach includes two unique 
enabling results. Please see Appendix sections A and B for details. 


4. Application 

We construct a model for count data and perform inference on survey responses collected by the Job Openings 
and Labor Turnover Survey (JOLTS), introduced in Example 5 of Section 1.1, which is administered by BLS on a 
monthly basis to a randomly-selected sample from a frame composed of non-agricultural U.S. private (business) 
and public establishments. JOLTS focuses on the demand side of U.S. labor force dynamics and measures job 
hires, separations (e.g. quits, layoffs and discharges) and openings. The JOLTS sampling design assigns inclu¬ 
sion probabilities (under sampling without replacement) to establishments to be proportional to the number of 
employees for each establishment (as obtained from the Quarterly Census of Employment and Wages (QCEW)). 
This design is informative in that the number of employees for an establishment will generally be correlated with 
the number of hires, separations and openings. We perform our modeling analysis on a May, 2012 data set of 
n = 8595 responding establishments. 

We begin by specifying a finite population regression probability model from which we formulate the sampling- 
weighted pseudo posterior joint distribution that we use to make inference on model parameters from the popu¬ 
lation generating distribution with only the observed sample of a finite population. We demonstrate that failing 
to incorporate sampling weights (e.g. by estimating the posterior distribution defined for the finite population on 
the observed sample) produces large differences in estimates of parameters. 

Our regression model defines a multivariate response as the number of job hires (Hires) for the first response 
variable and total separations (Seps) as the second response variable. We construct a single multivariate model (as 
contrasted with the specification of two univariate models) because these variables of interest tend to be highly 
correlated such that we expect the regression parameters to express dependence; for example, these two variables 
are correlated at 60% in our May 2012 dataset. 

We formulate a model for count data that accommodates the high degree of over-dispersion expressed in our 
establishment-indexed multivariate responses due to the large employment size differences across the establish¬ 
ments. Were we working with domain-indexed (e.g., by state or county) responses, we may consider to use a 
Gaussian approximation for the count data likelihood, but such is not appropriate for us due to the presence of 
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many small-sized establishments. The modeling of count data outcomes is very typical for the analysis of BLS 
survey data for establishments focused on (un)employment. 

We specify the following count data model for the population. 


yu 

Pois (exp ( y/,y )) 


(7) 

NxD 

NxDPxD 

/ DxD\ 



~ X B +^4/v X fl 

(ljv,A _1 ) 

(8) 


/ PxP 

\ / 

, [TbA] - 1 j 
/ 


B 

~0 + ^PxD M-^ 
\ 

(9) 

A 

~*^((D+1),I d ) 


(10) 

%B 



(11) 

M 

~W P ((P+ 1),Ip), 


(12) 


where i = 1..... /V indexes the number of establishments and d = 1..... /f indexes the number of dimensions for 

/Ox 1 


the multivariate response, Y. The N x D log-mean, 'P = tpi ■ ■ ■ ■ ; Vfv - ma y be viewed as a latent response 


whose columns index the number of job hires (Hires) and total separations (Seps) under our JOLTS application, 
so that D = 2. The number of predictors in the design matrix, X, is denoted by P and B are the unknown matrix 
of population coefficients that serve as the focus for our inference. Our model is formulated as a multivariate 
Poisson-lognormal model, under which the Gaussian prior of Equation 8 for the logarithm of the Poisson mean 
allows for over-dispersion (of different degrees) in each of the D dimensions. The priors in Equation 8 and 
Equation 9 are formulated in matrix variate (or, more generally, tensor product) Gaussian distributions using 
the notation of Dawid (1981); for example, the prior for the P x D matrix of coefficients, B, assigns the P xD 
mean 0 for a Gaussian distribution that employs a separable covariance structure where the PxP, M, denotes the 
precision matrix for the columns of B, and the DxD, T/;A, denotes the precision matrix for the rows. This prior 
formulation is the equivalent of assigning a PD dimensional Gaussian prior to a vectorization of B accomplished 
by stacking its columns with PD x PD precision matrix, IVI0 (T/,'A). (See Hoff (2011) for more background). 
Precision matrices, (M, A), each receive Wishart priors with hyperparameter values that impose uniform marginal 
prior distributions on the correlations (Barnard et al. 2000). 

We regress the multivariate latent response, l P, on predictors representing the logarithm of the overall establishment- 
indexed number of employees (Emp), obtained from the QCEW, the logarithm of the number of job openings 
(Open), region (Northeast, South, West, Midwest (Midw)) and ownership type (Private, Federal Government, 

State Government (State), Local Government (Local)). We convert region and ownership type to binary indicators 
and leave out the Northeast region and Federal Government ownership to provide the baseline of a full-column 
rank predictor matrix. We summarize our regression model on the logarithm scale by: (i^, y/ Seps ) ~ 1 + West 
+ Midw + South + State 4 - Local + Private + log(Emp) + log(Opens) + error, where 1 denotes an intercept 
(Int). 

Our population model is hypothesized to generate the finite population of the U.S. non-agricultural establish¬ 
ments, from which we have taken a sample of size n = 8595 for May, 2012 as our observations. For ease of 
reading, we will continue to use Y and X, to next define the associated pseudo posterior, though each possesses 
n <N rows representing the sampled observations, in this context. 

The population model likelihood contribution for establishment, i, on dimension, d, is formed with the inte¬ 
gration, 

p(y ; -d|x,',B,A) = Jp(y id Wid) xp(\if id \xi,R,A)d\i/ id , (13) 

R 


where sampling weight, w,- = 1 / zr,- and w< = nx w,/w,-, such that the adjusted weights sum to n, the asymp¬ 
totic amount of information contained in the sample (under a sampling design that obeys condition (A5)). This 
integrated likelihood induces the following pseudo likelihood. 


P Jt Cy/</|x/ I B,A) = 


P C ya I Yid ) X p (Yid |x«, B , A) d yf id 


(14) 
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which is analytically intractable, so we perform the integration, numerically, in our MCMC using the prior for 
each \jfid exponentiated by the normalized sampling weight, w,-, which we use to construct its pseudo posterior 
distribution. Using Bayes rule we present the logarithm of the pseudo posteriors for the latent set of D x 1 log- 
mean parameters, {■*/?,}, (which are a posteriori independent over ,ri), with. 


log p K (t/j/ly/jX/.B, A) °c 


log- 


D 


n ex P ( Vid Y id exp (- exp ( Yid )) 


d= I 
D 


Wi 

/ 

x^A- 1 ) 

W[ 

X 

(ipi 

y 


\ 

/ 

1 


D 1 / / \ / / 

Wi Y \yidWid - exp {\jf id )] - - [it'pi - x'b J WfA 

d= 1 ~ 


x,B , , 


(15a) 

(15b) 

(15c) 


where we note in the second expression in Equation 15c that the sampling weights influence the prior precision 
for each i such that a higher-weighted observation will exert relatively more influence on posterior inference 
because this observation is relatively more representative of the population. We take samples from the pseudo 
posterior distribution specified Equation 15c in our MCMC using the elliptical slice sampler of Murray et al. 
(2010), where we draw ?/>,■ jY d (x,B , (w,-A) 1 j and formulate a proposal as a convex combination (parameter¬ 
ized on an ellipse) of this draw from the prior and the value selected on the previous iteration of the MCMC. We 
evaluate each proposal using the weighted likelihood in the first expression of Equation 15c. 

We next illustrate the construction of the pseudo posterior distribution for the P x D matrix of regression 
coefficients, B, (which by D-separation is independent of the observations, ((y,d), given (y/,v/)). 


p n (B|Y,X, v P,A,M,Tb) « 


n^xZj(^iB'x / ,I„ ) A^ 1 ) M ’ i ,/f£ x0 (b|M“\ (tbA)- 1 ) (16a) 

i= 1 


log p n (B|Y,X,'F,A,M ) T|,) « Y ylog|A|-’|(^-B'x i ) A 

i=l L 

+ log JVp^D (bIM-'^TbA) -1 


B Xj 


(16b) 


In a Bayesian setting, the sum of the weights (n = Jj” =1 w;) impacts the estimated posterior variance as we observe 
in Equation 16b. We see that weights scale the quadratic product of the Gaussian kernel in Equation 16b such 
that we may accomplish the same result using the matrix variate formation to define the pseudo likelihood, 
^Kixd OE-XBIW^- 1 ), where W = diag (vi>i, - - -,the weights for the sampled observations, from which 
we compute the following conjugate conditional pseudo posterior distribution defined on the n observations, 

P K (B|Y,X,^,A,M,t b ) = K + ^pxd (B|(02) _i ,A _1 ) , (17) 

where = X'WX + t b M and hjj = (02) _I X'WF. 

Under employment of a simpler continuous response framework, the conditional posterior for B retains the 
same form as Equation 17, except the latent response on the logarithm scale, fl 1 , would be replaced by the 
observed data, Y. Intuitively, we note using a sampling-weighted pseudo prior for the latent response, fl 1 , for 
sampling coefficients, B, is analogous to using the sampling-weighted likelihood in the case of an observed, 
continuous response, Y. 

Each plot panel in Figure 1 compares estimated posterior distributions for a coefficient in B (within 95% cred¬ 
ible intervals), labeled by “predictor, dimension (of the multivariate response)”, when applied to the May, 2012 
JOLTS dataset between two estimation models: 1. The left-hand plot in each panel employs the sampling weights 
to estimate the pseudo posterior for B, induced by the pseudo posterior for the latent response in Equation 15c; 
2. The right-hand plot estimates the coefficients using the posterior distribution defined on the finite population, 
which may be achieved by replacing W by the identity matrix to equally weight establishments. Equal weighting 
of establishments assumes that the sample represents the same balance of information as the population from 
which it was drawn, which is not the case under an informative sampling design. Comparing estimation results 
from the pseudo posterior and population posterior distributions provides one method to assess the sensitivity of 
estimated parameter distributions to the sampling design. 
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Fig 1: Comparison of posterior densities for the each coefficient in the (P = 9) x (D = 2) coefficient matrix, B, 
within 95% credible intervals, based on inclusion sampling weights in a pseudo posterior (the left-hand plot in 
each panel) and exclusion of the sampling weights using the posterior distribution defined for the population (in 
the right-hand plot). Each plot panel is labeled by “predictor,response” for the two included response variables, 
“Hires”, and “Seps” (total separations). 


We observe that the estimated results are quite different in both location and variation between estimations 
performed under the pseudo posterior and population posterior distributions, indicating a high degree of informa¬ 
tiveness in the sampling design. The 95% credible intervals for the coefficients of the continuous predictors - (the 
log of) job openings (Opens) and employment (Emp) - don’t even overlap on both the number of hires (Hires) 
and separations (Seps) responses. The coefficient for the State ownership predictor and the number of hires re¬ 
sponse is bounded away from 0 when estimated under the (unweighted) population posterior, but is centered on 
0 under the sampling-weighted, pseudo posterior. The coefficient posterior variances estimated on the observed 
sample under the population posterior are understated because they don’t reflect the uncertainty with which the 
information in the sample expresses that in the population (which is captured through the sampling weights). 

4.1. Simulation Study 

We implement a simulation study to compare the marginal pseudo posterior distributions to the (unweighted) 
population posterior distributions for the regression coefficients, where both are estimated on the observed sample 
drawn under an informative sampling design. For this study we use the N = 8595 observations from the JOLTS 
May, 2012 data as our population. We take 100 Monte Carlo samples of size n v = (500,1000,1500,2500) 
establishments using an informative single-stage sample design with unequal inclusion probabilities based on 
the proportional to size sample used for the real JOLTS survey. Characteristics of the the sampling design, used 
for this study, at each sample size are presented in Table 4.1 . 

This sampling design will induce distributions of the observed samples that will be different from those for 
the population. The designed correlation between the response and inclusion probabilities will produce observed 
samples with values skewed towards higher numbers of hires and separations than in the population. Figure 2 
demonstrates this difference between the distributions for realized samples under the informative sampling de- 
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n v 

CUs 

min(7r v ) 

max(7r v ) 

CV(^v) 

Cor(y hires , 7C V ) 

Cor(y Seps , Jtv) 

1 

500 

56 

0.02 

1.00 

2.11 

0.80 

0.62 

2 

1000 

196 

0.04 

1.00 

1.60 

0.69 

0.50 

3 

1500 

357 

0.07 

1.00 

1.29 

0.61 

0.44 

4 

2500 

722 

0.14 

1.00 

0.91 

0.51 

0.36 


Table 1: Characteristics of single stage, fixed size pps sampling design used in simulation study. n v denotes the 
sample size. CUs denotes the number of certainty units (with inclusion probabilities equal to 1). n v denotes the 
inclusion probabilities (proportional to square root of JOLTS employment), CV(7T v ) denotes the coefficient of 
variation of 71 v , Cor(y hires ,7T v ) denotes correlation of the number of hires and K v and Cor(y Seps ,7T v ) denotes the 
correlation of the number of separations and K v . 


sign compared to those for the finite population. The left-most box plot in each of the two panels displays the 
population distribution for a response value. A single sample is drawn under a sequence of increasing sample 
sizes for illustration. The next set of box plots displays the resulting distributions for the response values in each 
sample with size increasing from left-to-right. The left-hand plot panel displays the distributions for the Hires 
response, while the right-hand panel displays those for the Seps (separations) response variable. 

Pseudo posterior and population posterior distributions are estimated on each Monte Carlo sample at each 
sample size in n v . Figure 3 compares estimation of the posterior distribution from the fully-observed popu¬ 
lation (left-hand box plot) to estimation using the pseudo posterior from sample observations taken under the 
proportional-to-size sampling design. The third box plot in each panel shows the estimation of the posterior 
distribution estimated on the same sample ignoring the informative sampling design. The last box plot in each 
panel displays the estimates of the posterior distribution from a simple random sample of the same size, where 
no correction for the sampling design is required, as a gold standard against which to measure the performance 
of the pseudo posterior distribution. We estimate the distributions on each of the 100 Monte Carlo draws for each 
sample size and concatenate the results such that they incorporate both the variation of population generation and 
repeated sampling from that population. The sample sizes, n v , increase from left-to-right across the plot panels. 
The top set of plot panels display the posterior distributions of the regression coefficient for the employment pre¬ 
dictor (Emp) and the hires response (Hires), while the bottom set of panels display the coefficient distributions 
for the employment predictor (Emp) and the total separations response (Seps). 
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Sample Size 


Fig 2: Distributions of response values for population compared to informative samples. The left-most box plot 
in each of the two plot panels contains the distribution for the JOLTS sample that we use as our “population” in 
the simulation study. The next set of box plots show the distribution for the response values for increasing sample 
sizes (from left-to-right) for each sample drawn under our single stage proportion-to-size design. The left-hand 
plot panel displays the Hires response variable and the right-hand panel displays the Seps (separations) response 
variable. 
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Fig 3: Comparison of posterior densities for 2 coefficients, Employment-Hires (top row of plot panels) and 
Employment-Separation (bottom row of plot panels) in B, within 95% credible intervals, between estimation 
on the population (left-hand plot in each panel), estimations from informative samples data taken from that 
population, which include sampling weights in a pseudo posterior (the second plot from the left in each panel) and 
exclusion of the sampling weights using the population posterior distribution (the third plot from the left) under 
a simulation study. The right-most plot presents the posterior density estimated from a simple random sample 
of the same size for comparison. The simulation study uses the May, 2012 JOLTS sample as the “population” 
and generates 500 informative samples for a range of sample sizes (of 500,1000,1500,2500, from left-to-right) 
under a sampling without replacement design with inclusion probabilities set proportionally to the square root 
of employment levels. A separate estimation is performed on each Monte Carlo sample and the draws from 
estimated distributions are concatenated over the samples. 
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Scanning from left-to-right in each row across the increasing sample sizes, we readily note a consistent dif¬ 
ference in the estimated posterior mean, as expected, between the population model estimated on the samples 
without adjustment for the informative sampling design as compared to the mean of the posterior distribution es¬ 
timated on the entire finite population. The application of the pseudo posterior model, however, produces much 
less difference (relative to estimation on the fully observed population), though the difference between the esti¬ 
mated pseudo-posterior and the population posterior is yet notably more than that for the simple random sampling 
result (estimated on samples of the same size as the informative sample). The estimated difference for the pseudo¬ 
posterior converges to 0, however, as the sample size increases. The posterior variance for the estimated posterior 
under simple random sampling remains larger than that for the pseudo posterior estimated on the informative 
sample because our proportion-to-size sampling design over-samples the highest variance units, which provides 
better capture of information in the population (which is why this design is used). So, in this case, the improved 
capture of information in the finite population provided by our sampling design more than overcomes the added 
variation induced by estimation with the sampling weights. In summary, this simulation study demonstrates the 
contraction of the pseudo posterior distribution estimated on the sample onto the posterior distribution estimated 
on a fully-observed finite population. 

We were able to directly perform posterior inference about the population using only quantities available for 
the observed sample under the pseudo posterior full conditional distributions outlined in Section 4. By contrast. 
Little (2004) offer no modeling approach that parameterizes a proportion-to-size sampling design because they 
note that each unit is in its own group under the stratum-indexed construction they generally suggest. A typical 
naive approach, however, is to simply include the sampling weights or a variable highly corrected with them as a 
predictor only for observed units with no imputation of the non-sampled units. This is precisely the construction 
of the alternative that estimates the population posterior distribution on the informative sample, which is shown 
as the third box plot in each plot panel of Figure 3, because the employment variable, Emp, is included as 
a predictor and is highly correlated with the sampling weights. This option includes Emp only for the sampled 
units as does the model for the pseudo posterior. The reason for biased inference, even when including a predictor 
that is highly correlated with sampling weights, is because the distribution for the sampled data conditioned on 
the sampling weights is not generally equal to the distribution for the population conditioned on the sampling 
weights by Bayes rule (Pfeffermann & Sverchkov 2009). 

The JOLTS respondent-level data from which samples were drawn for our Monte Carlo simulation study 
may not be publicly released due to restrictions that protect confidentiality of survey participants. A Monte 
Carlo simulation study using our pseudo posterior plug-in method may, however, be generated under a Bayesian 
nonparametric model for functional data that is available from the growfunctions package for R (Savitsky 
2015). The package includes a synthetic data generator whose use is illustrated in Savitsky (2014), along with 
a Monte Carlo simulator that compares parameter estimates when correcting versus ignoring an informative 
sampling design. Although the functional data model is more complicated than our count data application, the 
Monte Carlo simulation function available in growfunctions produces a figure that demonstrates results very 
similar to those displayed in Figure 3. 

5. Discussion 

A variety of broadly applicable approaches are available for incorporating sampling weights into weighted maxi¬ 
mum likelihood estimation procedures (Pfeffermann & Sverchkov 2009) to account for an informative sampling 
design. Defining easily adaptable algorithms that account for sampling design informativeness under any model 
for the population specified by the data analyst has proved more challenging for estimating Bayesian probability 
models. Solutions have focused on parameterizing the sampling design or co-estimating the conditional expec¬ 
tation of inclusion, along with the population-generating model. While these approaches allow estimation using 
the sampled observations, the implementations typically require a high degree of customization to each popula¬ 
tion model. We take a different approach that constructs a sample-weighted pseudo-posterior to account for an 
informative sampling design in our plug-in method that is readily accommodated to any Bayesian population 
probability model. 

We demonstrated the applicability of the plug-in method to a poisson - lognormal model for count data. We 
showed that the plug-in method reduces estimation bias and posterior estimation includes the uncertainty with 
which the sample reflects the population on these covariance parameters. 
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The plug-in method is as easily-implemented and broadly applicable as those methods used for likelihood 
based optimization. We illustrated in Section 4 that the full conditional posterior distributions defined for the 
population generating model are easily updated by multiplying the log-likelihoods for ({y,,/}, y/, t /), by the sam¬ 
pling weights, {vt>,}, without changing the constructions for full conditional posterior distributions. The same 
concerns that apply in the use of sampling weights under likelihood optimization also apply for Bayesian estima¬ 
tion. The quality of posterior estimation is highly dependent on the population representativeness of the realized 
sample. Sampling weights may be adjusted based on the composition of the realized sample through estimat¬ 
ing the conditional expectation of the weights, given the response values for the observed units. Regressing the 
weights on the response variables using the observed data and replacing the raw weights with their conditional 
expectation, known as “weight smoothing”, would be expected to reduce the posterior variances for estimated 
parameters to the extent that the weights express variance unrelated to the response. While the conditional distri¬ 
bution for the sampling weights given the response under the realized sample is not generally expected to be the 
same as that for the finite population, their expectations are equal (Pfeffermann & Sverchkov 2009). We explored 
such smoothing for our sampling weights for the JOLTS application, but there was little reduction in variance, so 
we employed the published weights for simplicity. 

Even after adjustment, if the composition of the realized sample unevenly reflects information in the pop¬ 
ulation, the weights would express a high variation. Approaches that calibrate the sampling weights to actual 
population totals, where known, may improve the quality of estimation produced from the plug-in method. BLS 
performs a calibration adjustment of the JOLTS sampling weights such that the weighted difference of hires 
and separations reported in the sample ties to the monthly total employment change from the CES survey. (The 
CES survey is introduced and discussed in Section 1.1). This step adjusts the sampling weights computed from 
inclusion probabilities under the JOLTS sampling design based on the actual sample achieved in each month. 

One may, alternatively, implement a more fully Bayesian approach that parameterizes a joint model for the 
response and sampling weights, specific to a given population generating model, as a method that smoothes the 
weights in the presence of the response values. Doing so, however, requires imputation of weights and response 
values for non-sampled units, which can be computationally expensive for a survey that samples from the entire 
U.S. population of business establishments, as does JOLTS. 

Lastly, we construct conditions which, together, define a class of sampling designs under which L\ consistency 
of the pseudo posterior is guaranteed. One of these conditions requires that the pairwise sample inclusion depen¬ 
dencies asymptotically decrease to 0. While there are many sampling designs, in practice, which are members 
of this class, including the proportion-to-size sampling design used for our JOLTS application, there are some 
designs which are not - such as a cluster sampling design where the number of clusters grows, but the number 
of units in each cluster remains relatively fixed. A direction for future research will be to widen the class of 
allowable designs by incorporating second order (or pairwise) inclusion probabilities for inference, though doing 
so will introduce some practical restrictions on the specifications for the population generating model. 
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Appendix A: Proof of Theorem 3.3 

Proof. Condition (Al) establishes the existence of test statistics, (j>„ v (Xi <5 v i,... ,Xn v S v n v ) G (0,1) used to achieve 
the following result, 


E /bvPv 

< exp 0 ?v<?v v ) 


exp (-Kn v M 2 tf v ) 

1 - exp (-Kn v M 2 ^ y 


< 2 exp (— Kn v 


•Ny) > 


( 18 ) 


in Lemmas 2 and 9 of Ghosal & van der Vaart (2007) by setting 8, = and by choosing constant M > 0 

sufficiently large, such that KM 2 — 1 > K. 

We will bound the expectation (under (Po,P v ), jointly) of the mass assigned by pseudo posterior distribution 
for those P at some minimum distance from Pq, 


n n (P&0>: 4 v {P,P 0 ) > M^v v |Xi5 v t, • ■ • ,X Nv S vNv ) 

= U n (P G 9 : 4 v (RP 0 ) > M^ff v \XiSyi ,... ,X Nv S vNv ) (<j>„ v +1 - <M. (19) 

Equation 18 establishes the bound, 


E/’o.^n 71 (P & BP : d% v {P,Po) > MB, Nv |Xi5yi ,... ,X Nv Sn v ) 0 „ v < E/> o 0„ v < 

2exp (-Kiiv^y), (20) 

since the pseudo posterior mass is bounded from above by 1. We next enumerate the pseudo posterior distribution 
for the second term of Equation 19, 


n* (P £ ^ : 4 v (P,P 0 ) > Mt; Nv |Xjv v 5tv v ) (1 - </>„ v ) = 


Pe^:d^ v (P.P 0 )>M^ Nv 


f\^ i (X i 8 V i)dU(P)( l-fc v ) 
i=l ^0 


PeS* 


fl 1 -^ (X,8 vl )dU(P) 
i= t P o 


( 21 ) 


We may bound the denominator of Equation 21 from below, in probability. Define the event, 

= |p : -ftlog (^) < &,fl> (log £f< &} 

We have from Lemma B.2, 



for every P € /Lv v and any C > 0, y > 1, where y may be set closer to 1 for sampling designs that define a low 
gradient for inclusion probabilities, {7T V /}. The constant, C 3 > 0, and will be close to 1 for sufficiently large v. 
Condition ( A3) restricts the prior on /iy v ,, 

n(£, Vv ) > exp(-AV^ v C). 
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Then with probability at least 1 — 


i6 f[r+c 3 ] 


(KAfif-liyNvt, 1 ’ 


AV. p K 




l\B^ Xi 8 vi )dYl{P) > exp [-(1 +C)N v t; 2 } n (B Nv ) 
i= l P o 


> exp (—(1 + 2 C)N v t;- 

P2 \ 
* 


> exp — 


KM 2 n v tf 


2 7 


where we set 1 + 2 C = , where we use condition (A6) to replace f xN v with n v for v sufficiently large. 


Denote this event by. 


A 71 — 




^P K ( KM 2 n v B 

n * (*Ai) (P) > exp ^ 

i=t P o \ 


such that, 


4v (^o) > M^ Nv |X Wv ^ v ) (1 - <t> nv ) 

1 


{Pe3>:d* v {P,P 0 )>M!; Nv } 


Y[^{Xi8 vi )dn{p){ i-^v v ) 
1=1 Po 


Peg* 


f[ P l (XAddUiP) 

i=l f-’o 


(I(Kl)+'«)) 


— ^ ( [^ v ] ) + n (4v) X 

(KM~n v ^ v \ 

+ exp - 


exp 


' KM 2 n v ty 


•N v 


2 7 


V 2y 


/ 


n(^\^ v 


1=1 r’o 


{Pe& Nv :d% v (P,P 0 )>M$ Nv } 

Taking the expectation of both sides with respect to the joint distribution, (Po.P v ), 

Ero+n 1 (P S & : 4 V M) > M^ Nv |Xjv v <5tv v ) (1 - 0n v ) 


<^([4 v ]°)+exp ( 

' KM 2 n v ^ 2 v 


( KM 2 n v B 


+ exp 


2y 


W I67 2 [7+C3] 


(■ KM 2 f- 2 y) z n v ^ 

KM 2 n v Q y 


\ 27 

' E /b A 


+ exp - 


n(5»\5»* v ) 

/ 

{/>e^iv v :rfj v {p,p 0 )>m£ Nv } 

/ KM 2 n v Q\ 

27 ) 


+ exp 


27 


•E 




/ 


{pe^v v :<$ v (p,/b)>M&v v } 


n^(x,- 5 V i)rfn(p)(i- 0 „ 

1=1 Po 


fl ^ (XtSy^dn (p) (1 - 4>„ v ) 

1=1 fo 


( 22 ) 


( 23 ) 


where in (i) we have used condition (A 2 ) that bounds from above TI(£? > \£? > n v ), the prior mass assigned on the 
portion of the model space that lies outside the sieve, and have plugged in for constant, C. 
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By conditions (Al), (A4) and Lemma B.l, 
E P 0 Pv [ 


{Pe&’ Nv :d* v (P,P 0 )>Mi; Nv } 


fK (xAddnfPji]-^) 

i= 1 P 0 


< 2 yexp ^ 


-KM 2 n v & 


r 


Returning to the expectation in Equation 23, 

Ep 0 ,A n?r (P € & : 4 V (/’.fl)) > \X Nv 6 Nv ) (1 - 0„ v ) 
16y 2 [y+C 3 ] 

——|- exp |--- | -f exp 


< 


< 


(KM 2 — 2y) N v ^ 
16y 2 [y+C 3 ] 


KM 2 n v E, 

W 


( KM~n v ^ff 
+ exp - 


2 y 


• 2 yexp 


( KM n v ^ v 


(Kf-2yfN v ^ 


<Ny 


+ 3yexp 


KM 2 n v $ 2 v 
2 Y 


where in (i) we use our earlier stated bound, KM 2 — 1 > K —> KM 2 > K+ 1. 
Bringing all the pieces together, 

EroAn 1 (P G & : df, v (P,Po) > Mt; Nv |Xi5vi,... ,X Nv 8n v 


, , . 16y 2 fy+C 3 l ( KM 2 n v tfj 

< 2 exp (-Kn^U) + „ 7 . 2 ~ e2 + 3 ^ ex P-^r; ~ 


(Kf-2YYN v ^ 2 v 


2 y 


< 


16y 2 [y+C 3 ] 
(Kf-2Y) 2 N v ^ Nv 


5yexp 


Kn v E, 


'Ny 


2 y 


(24) 


(25) 


where y> 1 and C 3 > 0. The right-hand side of Equation 25 tends to 0 (as V f °°) in Pq probability. This concludes 
the proof. □ 


Appendix B: Enabling Lemmas 


We next construct two enabling results needed to prove Theorem 3.1 to account informative sampling under (A4), 
(A5) and (A 6 ). The first enabling result. Lemma B.l, extends the applicability of Ghosal & van der Vaart (2007) 
- Lemmas 2 and 9 for inid models to informative sampling without replacement. This result is used to bound 
from above the numerator for the expectation with respect to the joint distribution for population generation and 
the taking of the informative sample, (Pq,P v ), of the pseudo posterior distribution in Equation 4 on the restricted 
set of measures, {P € B}, where B = {P £ & : d Nv (P,P 0 ) > <5<^v v }, (for any 8 > 0). The restricted set includes 
those P that are at some minimum distance, 8 , from Pq under pseudo Hellinger metric, dfj . The second 
result. Lemma B.2, extends Lemma 8.1 of Ghosal et al. (2000) to bound the probability of the denominator of 
Equation 4 with respect to (Po, P v ) , from below. 

Lemma B.l. Suppose conditions (Al) and (A4) hold. Then for every E, > E,n v , a constant, K > 0, and any 
constant, 8 > 0, 


E, 


'Pq,Pv 


E />oA / fl 3F ( x Ai) dU (P) (1 - 

Pe&\& Nv 1=1 0 


/ f\ l ^(X l 8 vi )dH(P)(] -0„ v ) 

PetP Nv :d^{p.p 0 )>dY =1 ° 


< n(&\& Nv ) 

' -Kn v 8 2 E, 2 


< 2 yexp 
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(26) 


(27) 
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The constant multiplier, y> 1, defined in condition (A4), restricts the distribution of the sampling design by 
bounding all marginal inclusion probabilities for population units away from 0. As with the main result, the upper 
bound is injured by y. 

Proof. We proceed constructively to simplify the form of the expectations on the left-hand side of both Equa¬ 
tions 26 and 27 and follow with an application of Lemma 2 (and result 2.2) and Lemma 9 of Ghosal & van der 
Vaart (2007), which is used to establish the right-hand bound of Equation 27 (based on the existence of tests, 

K)- 

Fixing v, we index units that comprise the population with, U v = {1 Next, draw a single observed 

sample of n v units from U v , indexed by subsequence, 

{if £ U v : 8 V i e = 1, t= 1..... /H' } • Without loss of generality, we simplify notation to follow by indexing the 
observed sample, sequentially, with t = \.... , n v . 

We next decompose the expectation under the joint distribution with respect to population generation, Pq, and 
the drawing of a sample, P v , 

Suppose we draw P from some set B C S?. By Fubini, 
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J S v eA v 
PeB 


< / E Po 
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- 1 

-1 

X 

_i 
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(1 — 0«v) 


dU{P) 
dll (P) 
du.{p) 


< j P K (\ ~(t> nv )dU(P), 


(28) 


ny 
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-(X/) 

1 

K vt 

<5v 

P Fv (S v ) \dU(P) 

(29) 

i=\ 

IPo \ 



J 



(30) 

(31) 

(32) 


PeB 


where ^ Pp v (<M = 1 (Sarndal et al. 2003) and 5^ £ A v = |{5 * ( } J=1 Nv , 8f £ {0,1}| denotes that sample, 

SyG Ay 

drawn from the space of all possible samples, A v , which maximizes the probability under the population gener¬ 
ating distribution for the event of interest. The inequality in Equation 32 results from < 1 and f > 1. The 
conditional expectation of (1 — (j>„ v ) given 8y is denoted by. If (1 — @„ v ). 

lf/'C ■S > \ .S > x v . 



r Nv n K 

e p 0 ,p v 

/ n&f( x ^)(i-K) 

J i= l P 0 


PG0>\@> Ny 

* , 

[ P K (l-^> nv )dU(P)< J 


dU (P) 

dYi(p) = n(&\& Nv ) 


Pe^X^Ny 


Pe&X&Nv 


since (1 — 0 „ v ) < 1 . 

We next establish a bound for Pg* (1 — 0„ v ) on a sieve or slice. Let = {P £ p? Nv : re^ v < d% < 

2r£;v v } for integers, r. Under observed (XiS^,... ,~X.N v 8y N ) £ ST, by conditions (Al) and (A4) we have. 
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sup P s * (l-0„ v ) 

(33) 

sup P s *{\-(j) nv ) 

{Pe& Nv :rl;<d* v (P,P 0 )<2rt;} 

(34) 

(0 

< sup Ps*(l-<j) nv ) 

{p^N v --f y <d Nv [Pd b)<ff} 

(35) 

(“) ( Kn v r 2 E, 2 \ 

- exp l - r J’ 

(36) 


where the smaller range in (;), P g S 2> n v '■ < £/tv v (P,Po) < increases (1 — </>„ v ). The result in (ii) uses 

condition (A2) to obtain the result of Lemmas 2 and 9 in Ghosal & van der Vaart (2007) where we set | | /y'y. 

Finally, fixing some value for 5 > 0, set r = 2 / (5 for a given, for integers, £ > 0. Following the approach for 
bounding the sum over the slices in Wong & Shen (1995), let L be the smallest integer such that 2 2, 8 2 q 2 > 2 r, 
since < y/2y (by our definition of the pseudo Hellinger metric in Section 3.2). Then, 


Ef 


/ 


{Pe& Nv :d§ v (P,P 0 )>St;} 

i>w ( 

2 u Kn v S 2 E, 2 


n^(xA,)^n(p)(i-0„ v ) 

1=1 P o 


Nv D U 

TT Hr (X,-5vi) du (P)( 1 - <t > Nv ) 
/to ' B °’'^{/>6^ v :2^<^ v (P,P 0 )<2^5Of=1^0 


< y£ exp ( - 

e=o 


<2 yexp — 


r 

Kn v S 2 $ 2 


for n v sufficiently large such that K " v ^ ^ > 1. 
This concludes the proof. 


Lemma B.2. For every t, > 0 and measure If on the set. 


P 0 


B = iP: —Polog ( — ) < ^ 2 ,Po f log — ) <£ 2 


P o 


(37) 

(38) 

(39) 

(40) 


□ 


under the conditions (A2), (A3), (A4), and (A5), we have for every C > 0 and N v sufficiently large, 



where the above probability is taken with the respect to Pq and the sampling generating distribution, P v , jointly. 

The bound of “1” in the numerator of the result for Lemma 8.1 of Ghosal et al. (2000), is replaced with y+C 3 
for our generalization of this result in Equation 41. The sum of positive constants, y+C 3 , is greater than 1 and 
will be larger for sampling designs where the inclusion probabilities, {7T V ,}, express relatively higher gradients. 
Observing each finite population in a skewed fashion through the taking of an informative sample may only slow 
the rate of posterior contraction (as compared to contraction of the posterior distribution defined on the fully 
observed finite population). 
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Proof. By Jensen’s inequality. 


Nv -p K 


N v 


i°g / n^( x '^)^n(B)>X / log '—(XiS^duiP) 
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1 1 />6^ P ° 


= N v -¥ Nv ( \og P % dU{P), 
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Pe!P 


where we recall that the last equation denotes the empirical expectation functional taken with respect to the joint 
distribution over population generating and informative sampling. By Fubini, 
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PeZP 


where we, again, apply Fubini. 

Then, the probability statement in the result of Equation 41 is bounded (from above) by, 

PrJ<G£ v [ log^n(P)<-v/jVv£ 2 (l+C)- yi^P Q [ log E d U{P) 

I J po J P 0 

D e 3 s PeZ? 
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= Pr 
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pezP 


where we have again applied Fubini in the second inequality and also the bound for Pq log for P on the 

set B. 

We now apply Chebyshev and Jensen’s inequality to bound the probability. 
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where E/> 0 /> v (•) denotes the expectation with respect to the joint distribution over population generation and 
sampling (from that population) without replacement. We apply Jensen’s inequality in Equation 42b and use 
E (X 2 j > Var(A') in the third inequality, stated in Equation 42c, and drop the centering term in Equation 42d. 
We now bound the expectation inside the square brackets on the right-hand side of Equation 42d, which is taken 
with respect to this joint distribution. In the sequel, define £/ v = cr (X |,... ,X^ V ) as the sigma field of information 
potentially available for the N v units in population, U v . 
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min K V i 

K V iK V j 

L ieu v 




<^ 2 (7+C 3 ), 


for sufficiently large N v , where we have applied the condition for P € B for the first term of the last two in¬ 
equalities and conditions and (A4) and (A5) for the last inequality. We additionally note that n V ij = n V j when 
i = j, i,j £ U v . This concludes the proof. □ 
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